World model -> encodes assumptions about the system (prior)
Data driven -> observes the outcomes of a system given an input (black box approach)
Quantities:
- data/problem prior (input): p(x)
- labels prior (output): p(t)
- joint probability of inputs and outputs (world model): p(x, t)
Bayes:
- posterior: probability of the labels given the data, p(t|x)
- likelihood: probability of the data given the labels, p(x|t)
Bayes' theorem ties these quantities together: p(t|x) = p(x|t) p(t) / p(x)
Approach 1: Null prior (ignore the data prior p(x))
Basically, what this tells us is that we can compute the posterior (the probability of the outputs given the data inputs, p(t|x)) up to a constant, as the product of the world model function evaluated under the assumption of target labels (the likelihood p(x|t)) and the distribution of the labels (p(t)): p(t|x) ∝ p(x|t) p(t). The second factor is assumed balanced, or at least representative of the real-world problem the data tries to model. This can mean that we may have to use data balancing and other techniques. The first factor, though, is the most important part, and that is where we decide how to model the world.
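As a minimal sketch of the null-prior computation p(t|x) ∝ p(x|t) p(t) (the class names and probability values below are made up for illustration), note that p(x) is never needed because it cancels in the normalization:

```python
def posterior(likelihoods, priors):
    """Null-prior Bayes: p(t|x) ∝ p(x|t) * p(t), normalized over labels t.

    `likelihoods` maps each label t to p(x|t) for a fixed input x;
    `priors` maps each label t to p(t).
    """
    unnorm = {t: likelihoods[t] * priors[t] for t in priors}
    z = sum(unnorm.values())  # normalizer; plays the role of p(x)
    return {t: v / z for t, v in unnorm.items()}

# Hypothetical numbers: the world model says this input is twice as
# likely under "cat" as under "dog", and the label prior is balanced.
print(posterior({"cat": 0.2, "dog": 0.1}, {"cat": 0.5, "dog": 0.5}))
# -> cat gets 2/3, dog gets 1/3
```

A skewed label prior p(t) would shift the posterior in the same way, which is exactly why the balance/representativeness assumption on the second factor matters.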
Classical ML simply takes this problem of approximating p(x|t) and throws data at it, making some assumptions along the way. The main assumption it makes is that the world can be described as a complex Gaussian, so the problem can be solved via the maximum likelihood principle (or, as is done in practice for tractability, by minimizing the negative log-likelihood).
In other words: throw as much data as possible at the model and assume it can learn via the maximum likelihood principle, ignoring the fact that we are modelling the world as a probability distribution. The questions that arise are: Is this enough? Isn't the world just a big probability distribution? Is it Gaussian?
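To make the maximum likelihood principle concrete, here is a small pure-Python sketch for the simplest case, a 1-D Gaussian world model (the data values are arbitrary): the sample mean and variance are exactly the parameters that minimize the negative log-likelihood.

```python
import math

def gaussian_nll(data, mu, sigma):
    # Negative log-likelihood of the data under N(mu, sigma^2);
    # this sum is the quantity minimized in practice for tractability.
    return sum(0.5 * math.log(2 * math.pi * sigma ** 2)
               + (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

def mle_gaussian(data):
    # Closed-form maximum likelihood estimates for a 1-D Gaussian.
    mu = sum(data) / len(data)
    var = sum((x - mu) ** 2 for x in data) / len(data)
    return mu, math.sqrt(var)

data = [1.2, 1.9, 2.1, 2.8, 2.0]
mu, sigma = mle_gaussian(data)
# Any other mean gives a strictly larger NLL on this data.
assert gaussian_nll(data, mu, sigma) < gaussian_nll(data, mu + 0.5, sigma)
```

The "complex Gaussian" assumption in modern ML amounts to replacing this closed-form fit with a high-dimensional, parameterized density and minimizing the same NLL objective by gradient descent.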
Approach 2: Null data
This is classical mathematics: it tries to model perfect system behaviour through physics. However, most real-life problems are simply not understood very well, or may not even have good enough physics-based approximations.
How do you define the world model p(x, t), where x lives in X, the RGB image space for some predefined height and width (for simplicity), and t lives in the target space of Cats and Dogs?
It's so much easier to solve these real-world problems through data. The set of interesting problems solvable by the physics approach is limited by the simplicity of the model and by our human understanding of the problem. However, there are advantages:
We can run simulations from a given input state and observe the behaviour of the world after some time. This is the domain of chaos theory: some systems are stable and some are unstable, and there is little understanding of the physics behind the latter. The stable ones are also interesting: how do we know a system is stable? How do we guarantee that we model our real-life problem using a stable system, so that our outputs are explainable without chaos-theory assumptions (i.e. they are deterministic)?
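A toy illustration of the stable/unstable split is the logistic map, a standard textbook example (the parameter values below are chosen for illustration): the same update rule is predictable for one parameter and chaotic for another, where two nearly identical initial states quickly stop agreeing.

```python
def logistic_trajectory(x0, r, steps):
    # Iterate the logistic map x_{n+1} = r * x_n * (1 - x_n).
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# Stable regime (r = 2.5): very different starting points converge to
# the same fixed point 1 - 1/r = 0.6, so outputs are explainable.
a = logistic_trajectory(0.2, 2.5, 100)[-1]
b = logistic_trajectory(0.8, 2.5, 100)[-1]
assert abs(a - 0.6) < 1e-9 and abs(b - 0.6) < 1e-9

# Chaotic regime (r = 4.0): a 1e-7 perturbation of the initial state
# leads to trajectories that eventually disagree completely.
c = logistic_trajectory(0.2, 4.0, 60)
d = logistic_trajectory(0.2 + 1e-7, 4.0, 60)
assert max(abs(u - v) for u, v in zip(c, d)) > 0.1
```

This is exactly the guarantee question above in miniature: for r = 2.5 a simulation answers questions about the long-run behaviour, while for r = 4.0 any measurement error in the input state destroys the prediction.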
Approach 3: Hybrid model
Like everything in life, an approach that takes both sides (black box and perfect physics) into account is desirable. In this case the problem becomes an iterative one:
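One possible shape of such a hybrid, sketched with hypothetical names and a deliberately trivial toy world (y = 2x + 1): a physics prior makes a first prediction, and a data-driven part fits only the residual the physics misses.

```python
def physics_model(x):
    # World-model side (assumed form for this sketch): idealized dynamics
    # that capture part of the truth (the slope) but miss an effect.
    return 2.0 * x

def fit_hybrid(inputs, targets):
    # Data-driven side: fit only the residual the physics leaves behind.
    # Here the "learner" is just the mean residual (a constant bias);
    # in practice it would be a trained model.
    residuals = [t - physics_model(x) for x, t in zip(inputs, targets)]
    bias = sum(residuals) / len(residuals)
    return lambda x: physics_model(x) + bias

# True world: y = 2x + 1; the physics knows the slope but not the offset,
# and the data fills in the missing piece.
model = fit_hybrid([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
assert model(3.0) == 7.0
```

The iteration then alternates between improving the physics prior where the residuals are large and retraining the data-driven part on what the improved physics still cannot explain.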